arxiv:2407.07726

PaliGemma: A versatile 3B VLM for transfer

Published on Jul 10, 2024
· Submitted by akhaliq on Jul 11, 2024

AI-generated summary

PaliGemma, a versatile Vision-Language Model based on SigLIP-So400m and Gemma-2B, demonstrates strong performance across numerous open-world tasks, including specialized areas like remote sensing and segmentation.

Abstract

PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.
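The abstract names the two building blocks but not how they fit together. The sketch below is a minimal illustration that assumes architectural details from the paper body rather than this abstract. The SigLIP-So400m encoder splits a square image into 14x14-pixel patches, and each patch becomes one token that is linearly projected into Gemma's embedding space. The image tokens plus the text prompt form a "prefix" with full bidirectional attention, and the generated "suffix" is decoded causally (a prefix-LM mask). The helper names `num_image_tokens` and `prefix_lm_mask` are hypothetical and do not come from any released codebase.

```python
from typing import List

# Assumed SigLIP-So400m patch size in pixels (14x14 patches).
PATCH_SIZE = 14


def num_image_tokens(resolution: int, patch: int = PATCH_SIZE) -> int:
    """Count the vision tokens produced for a square image of side `resolution`."""
    if resolution % patch != 0:
        raise ValueError("resolution must be divisible by the patch size")
    side = resolution // patch
    return side * side  # one token per patch


def prefix_lm_mask(n_prefix: int, n_suffix: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Prefix positions (image tokens + text prompt) see the whole prefix
    bidirectionally but never the suffix. Suffix positions see the full
    prefix and only earlier (or the same) suffix positions, i.e. causal decoding.
    """
    n = n_prefix + n_suffix
    return [[j < n_prefix or j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for res in (224, 448, 896):
        print(res, "px ->", num_image_tokens(res), "image tokens")
    for row in prefix_lm_mask(n_prefix=2, n_suffix=2):
        print(["x" if allowed else "." for allowed in row])
```

Under these assumptions, a 224px input contributes 256 image tokens ahead of the text prompt, and quadrupling the pixel area at 448px quadruples that to 1024. This is one reason higher-resolution variants are costlier to run.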

Community

Paper submitter


Also read: hf.co/blog/paligemma

Are the fine-tuned models going to be available on Hugging Face?


Models citing this paper 166


Datasets citing this paper 1

Spaces citing this paper 79

Collections including this paper 24